High-precision entity and relation extraction in medical domain based on pseudo-entity data augmentation

Andi GUO, Zhen JIA, Tianrui LI

Journal of Computer Applications 2024, 44 (2): 393-402. DOI: 10.11772/j.issn.1001-9081.2023020143


To address dense knowledge and error propagation during entity extraction and relation classification in the medical domain, a high-precision entity and relation extraction framework based on pseudo-entity data augmentation was proposed. First, a Transformer-based feature reading unit was added to the entity extraction module to capture category information, so that long medical entities could be identified accurately among dense entities. Second, a relation negative-example generation module was inserted into the pipeline extraction framework: an under-sampling-based pseudo-entity generation model produced pseudo-entities designed to confuse the relation classification model, and three data augmentation strategies were proposed to improve the model's ability to identify subject-object reversal, subject-object boundary errors, and relation classification errors. Finally, a levitated-marker-based relation classification model was used to alleviate the sharp increase in training time caused by data augmentation. On the CMeIE dataset, the proposed model was compared with four mainstream models. For entity extraction, it improved the F1 value by 2.26% over the second-best model, PL-Marker (Packed Levitated Marker); for entity relation extraction, it improved the F1 value by 5.45% and the precision by 15.62% over the second-best model, the pipeline extraction model proposed by CBLUE (Chinese Biomedical Language Understanding Evaluation). The experimental results show that using the feature reading unit together with the pseudo-entity data augmentation module effectively improves extraction precision.
